TRAP-TANDEM: data-driven extraction of temporal features from speech
2003 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '03)
Abstract
Conventional features in automatic speech recognition describe the instantaneous shape of the short-term speech spectrum. The TRAP-TANDEM features instead describe the likelihood of sub-word classes at a given time instant, derived from temporal trajectories of band-limited spectral densities in the vicinity of that instant. The paper presents some of the rationale behind the data-driven TRAP-TANDEM approach, briefly describes the technique, points to relevant publications, and summarizes the results achieved so far.
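To make "temporal trajectories of band-limited spectral densities" concrete, the following is a minimal sketch of how such trajectories might be sliced out of a matrix of per-band log energies. It is not the paper's implementation: the function name `temporal_trajectories`, the `context` parameter, the edge handling, and the per-trajectory mean removal are all illustrative assumptions, and the classifier stage that would turn trajectories into sub-word class likelihoods is not shown.

```python
import numpy as np

def temporal_trajectories(band_energies, center, context=50):
    """Cut one temporal trajectory per frequency band around a frame.

    band_energies: array of shape (num_frames, num_bands), assumed to hold
                   log energies of band-limited spectral densities.
    center:        index of the frame ("time instant") of interest.
    context:       number of frames taken on each side (an assumed value).

    Returns an array of shape (num_bands, 2 * context + 1).
    """
    num_frames = band_energies.shape[0]
    # Frame indices in the vicinity of the center; frames beyond the
    # utterance edges are replaced by the nearest valid frame (assumption).
    idx = np.clip(np.arange(center - context, center + context + 1),
                  0, num_frames - 1)
    # One row per band: the evolution of that band's energy over time.
    traj = band_energies[idx, :].T
    # Remove each trajectory's mean (an assumed normalization step).
    return traj - traj.mean(axis=1, keepdims=True)
```

Called as `temporal_trajectories(band_energies, center=200)`, this yields one row per band, in contrast to a conventional feature vector, which would be the single row `band_energies[200]` across all bands.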
Related resources
Results from a survey of attendees at ASRU 1997 and 2003
In 1997 the author conducted a survey at the IEEE workshop on ‘Automatic Speech Recognition and Understanding’ (ASRU) in which attendees were offered a set of twelve putative future events and asked to assign a date to each. Six years later, at ASRU’2003, the author repeated the survey with eight additional items. This paper presents the combined results from both surveys.
Progress and Prospects for Speech Technology: Results from Three Sexennial Surveys
In 1997, and again in 2003, the author was invited to conduct a survey at the IEEE workshop on ‘Automatic Speech Recognition and Understanding’ (ASRU) in which attendees were offered a set of statements about putative future events relating to progress in various aspects of speech technology R&D. The task of the respondents was to assign a date to each possible event. The 1997 and 2003 results ...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
A spoken dialogue system for conference/workshop services
This paper describes our progress towards building a telephony-based spoken dialogue system for workshop/conference services. A mixed-initiative dialogue system has been developed that is engineered to offer users natural interaction with the system, ease-of-use and robustness towards ambiguous requests and machine errors. A prototype system, known as W99, is described in this paper which was de...
Publication date: 2004